35 research outputs found
Contextual equivalence in lambda-calculi extended with letrec and with a parametric polymorphic type system
This paper describes a method for treating contextual equivalence in polymorphically typed lambda-calculi, and shows how to transfer equivalences from the untyped versions of lambda-calculi to their typed variants. The specific calculus considered has letrec and recursive types and is nondeterministic. All that is needed is the addition of a type label to every subexpression, together with some natural constraints for the consistency of the type labels and the well-scopedness of expressions. One result is that an elementary but typed notion of program transformation is obtained, and that untyped contextual equivalences also hold in the typed calculus as long as the expressions are well-typed. For reduction and typing to interact well, some reduction rules have to be accompanied by a type modification that generalizes or instantiates types.
Word Discovery in Visually Grounded, Self-Supervised Speech Models
We present a method for visually-grounded spoken term discovery. After
training either a HuBERT or wav2vec2.0 model to associate spoken captions with
natural images, we show that powerful word segmentation and clustering
capability emerges within the model's self-attention heads. Our experiments
reveal that this ability is not present to nearly the same extent in the base
HuBERT and wav2vec2.0 models, suggesting that the visual grounding task is a
crucial component of the word discovery capability we observe. We also evaluate
our method on the Buckeye word segmentation and ZeroSpeech spoken term
discovery tasks, where we outperform all currently published methods on several
metrics.
Comment: submitted to Interspeech 202
Phoneme Segmentation Using Self-Supervised Speech Models
We apply transfer learning to the task of phoneme segmentation and
demonstrate the utility of representations learned in self-supervised
pre-training for the task. Our model extends transformer-style encoders with
strategically placed convolutions that manipulate features learned in
pre-training. Using the TIMIT and Buckeye corpora, we train and test the model
in the supervised and unsupervised settings. The unsupervised case is handled by
using a noisy label set drawn from the predictions of a separate model that was
itself trained in an unsupervised fashion. Results indicate our model
eclipses previous state-of-the-art performance in both settings and on both
datasets. Finally, following observations during published code review and
attempts to reproduce past segmentation results, we find a need to disambiguate
the definition and implementation of widely-used evaluation metrics. We resolve
this ambiguity by delineating two distinct evaluation schemes and describing
their nuances.
Comment: Accepted to SLT 202